Minimum mean squared error based warped complex cepstrum analysis for statistical parametric speech synthesis
نویسندگان
چکیده
This paper presents an approach for complex cepstrum analysis based on the minimum mean squared error criterion, and describes its application to statistical parametric speech synthesis. The proposed method alleviates some of the issues associated with conventional complex cepstrum analysis, such as choice of the window, phase unwrapping, and the need for accurate pitch marks. Given initial estimates of warped complex cepstra and respective analysis instants, the method iteratively optimizes the complex cepstrum on a warped quefrency domain by minimizing the mean squared error between the natural and the reconstructed speech waveforms. When applied to statistical parametric speech synthesis, the optimized complex cepstrum results in better performance in terms of synthesized speech quality, specially for emotional databases, when compared with the complex cepstrum calculated through conventional methods.
منابع مشابه
Speaker dependent model order selection of spectral envelopes
This work introduces a maximum-likelihood based model order (MO) selection technique for spectral envelopes to apply speaker dependent adaptation in the feature-space similar to vocal tract length normalization. Speech recognition systems based on spectral envelopes are using a fixed MO for the underlying linear parametric model. Using a fixed MO over different speakers or channels might not be...
متن کاملCepstrum-domain acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments
This paper presents a set of acoustic feature pre-processing techniques that are applied to improving automatic speech recognition (ASR) performance on noisy speech recognition tasks. The principal contribution of this paper is an approach for cepstrum-domain feature compensation in ASR which is motivated by techniques for decomposing speech and noise that were originally developed for noisy sp...
متن کاملMinimum trajectory error training for deep neural networks, combined with stacked bottleneck features
Recently, Deep Neural Networks (DNNs) have shown promise as an acoustic model for statistical parametric speech synthesis. Their ability to learn complex mappings from linguistic features to acoustic features has advanced the naturalness of synthesis speech significantly. However, because DNN parameter estimation methods typically attempt to minimise the mean squared error of each individual fr...
متن کاملAcoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments
This paper presents a set of acoustic feature pre–processing techniques that are applied to improving automatic speech recognition (ASR) performance on the Aurora 2 noisy speech recognition task. The principal contribution of this paper is an approach for cepstrum domain feature compensation in ASR which is motivated by techniques for decomposing speech and noise that were originally developed ...
متن کاملReducing Mismatch in Training of DNN-Based Glottal Excitation Models in a Statistical Parametric Text-to-Speech System
Neural network-based models that generate glottal excitation waveforms from acoustic features have been found to give improved quality in statistical parametric speech synthesis. Until now, however, these models have been trained separately from the acoustic model. This creates mismatch between training and synthesis, as the synthesized acoustic features used for the excitation model input diff...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013